Summary

This document presents an updated analysis of start codon usage and context in Cryptococcus neoformans. It uses the best-transcript annotation, and the corresponding map of start codon positions and sequences, made by Corinne Maufrais in June 2018.

It covers both JEC21 and H99 data: first several analyses on JEC21, then the same analyses on H99, then a joint analysis of signals conserved across both strains.

We check consensus sequences for both “narrow” (NNNNNATG) and “wide” (NNNNNNNNNATGNNN) neighbourhoods of the start codon. Comparing annotated ATGs (aATGs) with downstream ATGs (dATGs), we find essentially the same results with both, so the subsequent analyses use the narrow score.

Load Packages

JEC21 first

Expression: RNA abundance and ribosome-protected fragments

Load expression data

## # A tibble: 6,639 x 4
## # Groups:   Gene [6,639]
##    Gene        RNA    RPF    TE
##    <chr>     <dbl>  <dbl> <dbl>
##  1 CNM01300  4001. 18299. 4.57 
##  2 CNM01080  8388.  8973. 1.07 
##  3 CNA07570  5785.  7163. 1.24 
##  4 CNG04360  3188.  7062. 2.22 
##  5 CNB02360  3708.  6972. 1.88 
##  6 CNA06350 15095.  6604. 0.437
##  7 CNC00700  2357.  6232. 2.64 
##  8 CNF03840 11379.  6188. 0.544
##  9 CNF02150 15472.  6159. 0.398
## 10 CNF03160  5121.  6125. 1.20 
## # ... with 6,629 more rows

TE (translation efficiency) is the ratio RPF/RNA. We also calculated hiTrans_JEC21, the 330 most highly translated genes (top 5% by RPF TPM).
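The following is a minimal Python sketch of the TE ratio and of a top-fraction selection like hiTrans_JEC21. It uses a few rows from the table above. The helper names are hypothetical, and the original analysis was done in R. The rounding rule that turns a percentage into a gene count is also an assumption, so the count may differ slightly from the 330 reported.

```python
def translation_efficiency(rna, rpf):
    """TE as the ratio of ribosome-protected-fragment TPM to RNA TPM."""
    return rpf / rna

def top_fraction(values, fraction):
    """Return the keys of the top `fraction` of entries, ranked by value.

    Hypothetical helper: the number kept is rounded to the nearest integer,
    which may not match the cut-off used in the original R analysis.
    """
    n_keep = round(len(values) * fraction)
    ranked = sorted(values, key=values.get, reverse=True)
    return set(ranked[:n_keep])

# A few rows from the JEC21 expression table: gene -> (RNA TPM, RPF TPM).
expr = {
    "CNM01300": (4001.0, 18299.0),
    "CNM01080": (8388.0, 8973.0),
    "CNA06350": (15095.0, 6604.0),
    "CNF02150": (15472.0, 6159.0),
}
te = {g: translation_efficiency(rna, rpf) for g, (rna, rpf) in expr.items()}
rpf_tpm = {g: rpf for g, (_, rpf) in expr.items()}
hi_trans = top_fraction(rpf_tpm, 0.5)  # top half of this toy set

print(round(te["CNM01300"], 2))  # 4.57
print(sorted(hi_trans))          # ['CNM01080', 'CNM01300']
```

On the full table the same call would use `fraction=0.05`.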

Ribosome occupancy mostly tracks RNA abundance

ATG Context

Load context data

## # A tibble: 6,639 x 19
##    Gene   aATG.context   aATG.pos d1.context  d1.posTSS d1.posATG d1.frame
##    <chr>  <chr>             <int> <chr>           <int>     <int>    <int>
##  1 CNA00… GACCCCCTTGTTA…       93 ATAGCTGGTC…       226      -133        1
##  2 CNA00… ATATTGCCTGAGA…      102 GTCCACCTTA…       163       -61        1
##  3 CNA00… GAACTATCAAGCA…      214 GAGGCTCCGC…       512      -298        1
##  4 CNA00… ATTTTCAACAGCA…       81 AGCAATATAC…       307      -226        1
##  5 CNA00… ACCGTGCACACCA…       76 GTATTCGGGG…       106       -30        0
##  6 CNA00… AATCATACCAAAA…      117 GCCCCTATCT…       186       -69        0
##  7 CNA00… CCGACTATAAAAA…       52 AACCGTGCTA…       112       -60        0
##  8 CNA00… CTTTCTCTTCAGA…       77 TGCTATAGCA…        98       -21        0
##  9 CNA00… TAATCACACAAGA…      330 CTCATCATCA…       391       -61        1
## 10 CNA00… AAAAAAAACGCGA…      146 ACTTGTCGAC…       184       -38        2
## # ... with 6,629 more rows, and 12 more variables: d2.context <chr>,
## #   d2.posTSS <int>, d2.posATG <int>, d2.frame <int>, u1.context <chr>,
## #   u1.posTSS <int>, u1.posATG <int>, u1.frame <int>, u2.context <chr>,
## #   u2.posTSS <int>, u2.posATG <int>, u2.frame <int>
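On the ten rows printed above, the d1 columns follow a consistent pattern. d1.posATG equals aATG.pos minus d1.posTSS, and d1.frame equals (d1.posTSS minus aATG.pos) modulo 3. On that reading, frame 0 means the dATG is in frame with the aATG. This relationship is inferred from the printed rows rather than documented here. The small Python check below tests it against those rows:

```python
# (aATG.pos, d1.posTSS, d1.posATG, d1.frame) for the ten JEC21 rows printed above.
rows = [
    (93, 226, -133, 1),
    (102, 163, -61, 1),
    (214, 512, -298, 1),
    (81, 307, -226, 1),
    (76, 106, -30, 0),
    (117, 186, -69, 0),
    (52, 112, -60, 0),
    (77, 98, -21, 0),
    (330, 391, -61, 1),
    (146, 184, -38, 2),
]

def datg_offset_and_frame(aatg_pos, datg_pos_tss):
    """Offset of a downstream ATG relative to the aATG, and its frame.

    Inferred from the printed table: the offset is negative for downstream
    ATGs, and frame 0 means the dATG is in frame with the annotated ATG.
    """
    offset = aatg_pos - datg_pos_tss
    frame = (datg_pos_tss - aatg_pos) % 3
    return offset, frame

for aatg_pos, pos_tss, pos_atg, frame in rows:
    assert datg_offset_and_frame(aatg_pos, pos_tss) == (pos_atg, frame)
print("all rows consistent")
```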

Annotated ATGs have a Kozak consensus sequence

Highly translated annotated ATGs have a Kozak consensus sequence

This uses hiTrans_JEC21, the 330 most highly translated genes (top 5% by RPF TPM).

Upstream ATGs don’t have a consensus

Using the first upstream ATG (u1).

Downstream ATGs don’t have a consensus

Using the first downstream ATG (d1).

In-frame downstream ATGs of highly translated genes don’t have a consensus

Apart from a third-codon-position bias, no pattern is visible.

Calculate information content and scores of the consensus motif

Calculate a wide and a narrow consensus sequence

Calculate the motif score of each ATG against position weight matrices (PWMs) for both the narrow (-5 from the ATG through the ATG) and the wide (-9 from the ATG to +3) Kozak consensus motifs. Both motifs are derived from the top 5% most highly translated genes.
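Here is a minimal Python sketch of building a position probability matrix from equal-length windows around the ATGs of highly translated genes. It assumes the windows have already been extracted (narrow: 8 nt, NNNNNATG; wide: 15 nt, NNNNNNNNNATGNNN). The pseudocount is an illustrative parameter. Biostrings' own `PWM()` function applies its own defaults, which this sketch does not reproduce.

```python
BASES = "ACGT"

def position_probability_matrix(windows, pseudocount=0.0):
    """Per-position base frequencies for equal-length DNA windows.

    Returns a list with one dict per position, mapping base -> frequency.
    `pseudocount` is added to every base count (an illustrative choice).
    """
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all windows must have the same length")
    matrix = []
    for i in range(length):
        counts = {b: pseudocount for b in BASES}
        for w in windows:
            counts[w[i]] += 1  # raises KeyError on non-ACGT bases such as N
        total = sum(counts.values())
        matrix.append({b: counts[b] / total for b in BASES})
    return matrix

# Hypothetical narrow windows (-5 through ATG), not real gene sequences.
narrow = ["CAAAAATG", "AACAAATG", "CAAACATG", "TAAAAATG"]
ppm = position_probability_matrix(narrow)
print(ppm[0])  # {'A': 0.25, 'C': 0.5, 'G': 0.0, 'T': 0.25}
```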

Estimate the information content

We use the sequence-logo definition of information content; details are at https://en.wikipedia.org/wiki/Sequence_logo

Information content of the highly translated consensus, excluding the 6 bits contributed by the ATG itself: 3.01 bits for the narrow motif and 4.71 bits for the wide motif.
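Under the sequence-logo definition, the information content of a DNA position is 2 - H bits. Here H = -Σ p·log2(p) is the Shannon entropy of the base frequencies, and this form leaves out the small-sample correction. A motif's total is the sum over positions. A perfectly conserved ATG therefore contributes 3 × 2 = 6 bits, which is why it is excluded above. The minimal Python sketch below assumes a position probability matrix given as one base-to-frequency dict per position. Whether the original analysis applied the small-sample correction is not stated here.

```python
import math

def position_information(freqs):
    """Information content (bits) of one DNA position: 2 minus Shannon entropy."""
    entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
    return 2.0 - entropy

def motif_information(ppm, exclude=()):
    """Total information content, skipping the position indices in `exclude`."""
    return sum(position_information(f) for i, f in enumerate(ppm) if i not in exclude)

atg = [{"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
       {"A": 0.0, "C": 0.0, "G": 0.0, "T": 1.0},
       {"A": 0.0, "C": 0.0, "G": 1.0, "T": 0.0}]
uniform = {b: 0.25 for b in "ACGT"}
half_a = {"A": 0.5, "C": 0.5 / 3, "G": 0.5 / 3, "T": 0.5 / 3}

print(motif_information(atg))                                           # 6.0
print(position_information(uniform))                                    # 0.0
print(round(motif_information([half_a] + atg, exclude={1, 2, 3}), 2))   # 0.21
```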

Calculate scores of aATG, dATG, uATG against Kozak consensus

We calculate scores using Biostrings::PWMscoreStartingAt.

The best description I could find of this method is: https://support.bioconductor.org/p/61520/

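The score is simply the sum, over motif positions, of the PWM weight for the base observed at that position; equivalently, it is the sum of the element-wise product of the PWM with a one-hot encoding of the sequence.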
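The minimal Python sketch below computes this kind of score. It sums the matrix weight for the observed base at each position, then rescales so that the worst achievable sequence scores 0 and the best scores 1. The rescaling is an assumption chosen to match the 0 to 1 range of the score tables in this document. It is not necessarily what `PWMscoreStartingAt` returns by default.

```python
def pwm_score(pwm, seq, start=0):
    """Sum of PWM weights for the bases of `seq` aligned at `start`.

    `pwm` is a list of dicts (one per motif position) mapping base -> weight;
    this equals the sum of the element-wise product of the PWM with a
    one-hot encoding of the sequence window.
    """
    window = seq[start:start + len(pwm)]
    if len(window) != len(pwm):
        raise ValueError("sequence too short for the motif at this start")
    return sum(col[base] for col, base in zip(pwm, window))

def unit_scaled_score(pwm, seq, start=0):
    """Rescale the raw score to [0, 1] using the min and max achievable scores."""
    lo = sum(min(col.values()) for col in pwm)
    hi = sum(max(col.values()) for col in pwm)
    return (pwm_score(pwm, seq, start) - lo) / (hi - lo)

# Toy 2-position PWM of probabilities; the real motifs have 8 or 15 positions.
pwm = [{"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
       {"A": 0.1, "C": 0.6, "G": 0.2, "T": 0.1}]
print(unit_scaled_score(pwm, "AC"))            # 1.0
print(unit_scaled_score(pwm, "TT"))            # 0.0
print(round(unit_scaled_score(pwm, "AG"), 3))  # 0.636
```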

Write scores to file scores_kozak_JEC21.txt.

## # A tibble: 6,639 x 11
##    Gene     aATG.scorekn d1.scorekn u1.scorekn aATG.scorekw d1.scorekw
##    <chr>           <dbl>      <dbl>      <dbl>        <dbl>      <dbl>
##  1 CNA00010        0.737      0.723      0.968        0.662      0.707
##  2 CNA00020        0.814      0.934      0.868        0.735      0.892
##  3 CNA00030        0.875      0.698      0.804        0.874      0.689
##  4 CNA00040        0.887      0.802     NA            0.863      0.772
##  5 CNA00050        0.955      0.707     NA            0.839      0.707
##  6 CNA00060        1.000      0.889     NA            0.936      0.792
##  7 CNA00070        0.977      0.809     NA            0.893      0.670
##  8 CNA00075        0.791      0.922      0.735        0.799      0.835
##  9 CNA00080        0.933      0.923     NA            0.890      0.886
## 10 CNA00090        0.848      0.781     NA            0.875      0.699
## # ... with 6,629 more rows, and 5 more variables: u1.scorekw <dbl>,
## #   d1vsan <dbl>, u1vsan <dbl>, d1vsaw <dbl>, u1vsaw <dbl>

Plot against narrow consensus (-5 to ATG)

Plot against wide consensus (-9 to +3 from ATG)

Compare aATG and dATG context by gene

Most dATG scores are less than aATG scores

Most u1ATG scores are less than aATG scores

For highly translated genes, most dATG scores are much less than aATG scores

Red: high dATG vs aATG Kozak score. Blue: highly translated. Purple: both.

Genes with unusual dATG vs aATG narrow score

These are the 330 genes with the largest narrow-score difference (dATG minus aATG), in decreasing order:

## # A tibble: 330 x 3
##    Gene     aATG.scorekn d1.scorekn
##    <chr>           <dbl>      <dbl>
##  1 CNI00340        0.675      0.967
##  2 CNK00900        0.708      0.999
##  3 CNA01530        0.675      0.956
##  4 CNI00670        0.710      0.990
##  5 CNB01880        0.625      0.894
##  6 CNK02980        0.700      0.968
##  7 CNL05790        0.642      0.909
##  8 CNA07070        0.645      0.911
##  9 CNN00160        0.724      0.989
## 10 CNB00790        0.736      0.999
## # ... with 320 more rows
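The rows printed above are ordered by the narrow-score difference, d1.scorekn minus aATG.scorekn, largest first. The d1vsan column is presumably this difference, although that is inferred from the ordering rather than stated here. The minimal Python sketch below reproduces the ranking on a few rows from the tables above. Scores are rounded to three decimals, as printed.

```python
def rank_by_score_gain(scores, n):
    """Return the n genes whose dATG score most exceeds their aATG score.

    `scores` maps gene -> (aATG score, d1 score); genes with a missing
    score (None, i.e. NA) are skipped.
    """
    diffs = {
        gene: d1 - a
        for gene, (a, d1) in scores.items()
        if a is not None and d1 is not None
    }
    return sorted(diffs, key=diffs.get, reverse=True)[:n]

scores = {
    "CNA00010": (0.737, 0.723),
    "CNA00020": (0.814, 0.934),
    "CNK00900": (0.708, 0.999),
    "CNI00340": (0.675, 0.967),
    "CNB01880": (0.625, 0.894),
}
print(rank_by_score_gain(scores, 3))  # ['CNI00340', 'CNK00900', 'CNB01880']
```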

dATG vs aATG ribosome occupancy depends on the context

Restricted to the top 50% of genes (3315) by mean RNA TPM.

dATG vs aATG ribosome occupancy depends on the context, geometric mean across replicates
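The geometric mean is the exponential of the mean log value. The minimal Python sketch below assumes per-replicate TPMs and a pseudocount of 1 to handle zero counts. The pseudocount is a hypothetical choice, not taken from this analysis.

```python
import math

def geometric_mean(values, pseudocount=1.0):
    """Geometric mean of non-negative values, offset by a pseudocount.

    The pseudocount is added before taking logs and subtracted afterwards;
    with pseudocount=0.0 this is the plain geometric mean (values must be > 0).
    """
    logs = [math.log(v + pseudocount) for v in values]
    return math.exp(sum(logs) / len(logs)) - pseudocount

# Hypothetical per-replicate TPMs for one ORF.
print(round(geometric_mean([2.0, 8.0], pseudocount=0.0), 6))  # 4.0
print(round(geometric_mean([0.0, 3.0]), 6))                   # 1.0
```

With pseudocount 0, the geometric mean of per-replicate ratios equals the ratio of the geometric means. A dATG/aATG occupancy ratio summarised this way is therefore the same whether you average first or divide first.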

There is slight enrichment of high-score dATGs in frame near the aATG

Compare score difference to localization predictions

Load predictions from MitoFates

From the input file JEC21_mitofates_26June2018.txt.

Genes with high dATG vs aATG score are enriched in mitochondrial presequences

However, mito-localized genes do not have a distinctive aATG context

Only a subset of them does: the dual-localized ones.
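This document does not state which test underlies the enrichment of mitochondrial presequences among genes with a high dATG vs aATG score. One common choice is a one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test on the 2×2 table. Below is a stdlib-only Python sketch with hypothetical counts, not results from this analysis:

```python
from math import comb

def hypergeom_enrichment_p(total, successes, drawn, overlap):
    """One-sided P(X >= overlap) for the hypergeometric distribution.

    Draw `drawn` genes (e.g. high dATG vs aATG score) without replacement
    from `total` genes, of which `successes` carry a property (e.g. a
    predicted mitochondrial presequence).
    """
    upper = min(successes, drawn)
    numerator = sum(
        comb(successes, k) * comb(total - successes, drawn - k)
        for k in range(overlap, upper + 1)
    )
    return numerator / comb(total, drawn)

# Hypothetical toy counts.
print(hypergeom_enrichment_p(total=4, successes=2, drawn=2, overlap=2))  # 0.16666666666666666
```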

uATGs inhibit translation of the main ORF

uATGs are associated with lower absolute translation

uATGs are associated with lower translation efficiency
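Genes without a uATG have NA in the u1 columns of the score table, so genes can be split on that. The minimal Python sketch below compares median TE between the two groups, using hypothetical values. It only describes the association and does not reproduce the statistics behind the plots.

```python
from statistics import median

def median_te_by_uatg(genes):
    """Median TE for genes with and without a first upstream ATG (u1).

    `genes` maps gene -> (u1 score or None, TE); None stands for NA,
    i.e. no uATG.
    """
    with_u = [te for u1, te in genes.values() if u1 is not None]
    without_u = [te for u1, te in genes.values() if u1 is None]
    return median(with_u), median(without_u)

# Hypothetical genes, not values from the tables in this document.
genes = {"g1": (0.8, 0.5), "g2": (None, 1.2), "g3": (0.9, 0.7),
         "g4": (None, 1.6), "g5": (None, 0.9)}
print(median_te_by_uatg(genes))  # (0.6, 1.2)
```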

uATG vs aATG ribosome occupancy depends on the context

Restricted to the top 50% of genes (3315) by mean RNA TPM.


H99 second

Expression: RNA abundance and ribosome-protected fragments

Load expression data

## # A tibble: 6,797 x 4
## # Groups:   Gene [6,797]
##    Gene          RNA    RPF    TE
##    <chr>       <dbl>  <dbl> <dbl>
##  1 CNAG_06125 10279. 20179. 1.96 
##  2 CNAG_06101  8672.  8471. 0.977
##  3 CNAG_05762  7396.  7528. 1.02 
##  4 CNAG_00779  3861.  7368. 1.91 
##  5 CNAG_03127  6257.  7184. 1.15 
##  6 CNAG_04011 13313.  6786. 0.510
##  7 CNAG_06222  6401.  6784. 1.06 
##  8 CNAG_01455 12683.  6580. 0.519
##  9 CNAG_05525  6978.  6456. 0.925
## 10 CNAG_03739  6383.  6394. 1.00 
## # ... with 6,787 more rows

TE (translation efficiency) is the ratio RPF/RNA. We also calculated hiTrans_H99, the 330 most highly translated genes (top 5% by RPF TPM).

Ribosome occupancy mostly tracks RNA abundance

ATG Context

Load context data

## # A tibble: 6,797 x 19
##    Gene   aATG.context   aATG.pos d1.context  d1.posTSS d1.posATG d1.frame
##    <chr>  <chr>             <int> <chr>           <int>     <int>    <int>
##  1 CNAG_… TACTTACGCGACA…       70 AAATTCACTT…       100       -30        0
##  2 CNAG_… GAACTTCGATCAA…       52 TCTCCCGCCA…       114       -62        2
##  3 CNAG_… GTAGACTTACCTA…      346 CACGGGCATC…       395       -49        1
##  4 CNAG_… CACATACGTAACA…      214 CCGAACGGCG…       256       -42        0
##  5 CNAG_… GACTATACAAAAA…       55 GGAGGTGGGC…       163      -108        0
##  6 CNAG_… AACCATACAAAAA…       99 CAAAGCCATT…       259      -160        1
##  7 CNAG_… ACCGTGCACACCA…       75 GTATTCGGAA…       105       -30        0
##  8 CNAG_… GTTTTCAACAGCA…       73 CCCATCAGAA…       380      -307        1
##  9 CNAG_… GTACTATTGAACA…      206 GAGGCTCCGC…       513      -307        1
## 10 CNAG_… TACAAGCTTGAAA…       90 GGCCGCCTTA…       151       -61        1
## # ... with 6,787 more rows, and 12 more variables: d2.context <chr>,
## #   d2.posTSS <int>, d2.posATG <int>, d2.frame <int>, u1.context <chr>,
## #   u1.posTSS <int>, u1.posATG <int>, u1.frame <int>, u2.context <chr>,
## #   u2.posTSS <int>, u2.posATG <int>, u2.frame <int>

Annotated ATGs have a Kozak consensus sequence

Highly translated annotated ATGs have a Kozak consensus sequence

This uses hiTrans_H99, the 330 most highly translated genes (top 5% by RPF TPM).

Upstream ATGs don’t have a consensus

Using the first upstream ATG (u1).

Downstream ATGs don’t have a consensus

Using the first downstream ATG (d1).

In-frame downstream ATGs of highly translated genes don’t have a consensus

Apart from a third-codon-position bias, no pattern is visible.

Calculate information content and scores of the consensus motif

Calculate a wide and a narrow consensus sequence

Calculate the motif score of each ATG against position weight matrices (PWMs) for both the narrow (-5 from the ATG through the ATG) and the wide (-9 from the ATG to +3) Kozak consensus motifs. Both motifs are derived from the top 5% most highly translated genes.

Estimate the information content

We use the sequence-logo definition of information content; details are at https://en.wikipedia.org/wiki/Sequence_logo

Information content of the highly translated consensus, excluding the 6 bits contributed by the ATG itself: 3.12 bits for the narrow motif and 4.9 bits for the wide motif.

Calculate scores of aATG, dATG, uATG against Kozak consensus

Write scores to file scores_kozak_H99.txt.

## # A tibble: 6,797 x 11
##    Gene       aATG.scorekn d1.scorekn u1.scorekn aATG.scorekw d1.scorekw
##    <chr>             <dbl>      <dbl>      <dbl>        <dbl>      <dbl>
##  1 CNAG_00002        0.866      0.784      0.853        0.834      0.687
##  2 CNAG_00003        0.833      0.847     NA            0.792      0.808
##  3 CNAG_00004        0.794      0.795     NA            0.759      0.665
##  4 CNAG_00005        0.880      0.752      0.855        0.892      0.659
##  5 CNAG_00006        0.978      0.727     NA            0.872      0.624
##  6 CNAG_00007        0.978      0.708     NA            0.916      0.662
##  7 CNAG_00008        0.960      0.819     NA            0.847      0.798
##  8 CNAG_00009        0.876      0.798     NA            0.864      0.783
##  9 CNAG_00010        0.896      0.796      0.822        0.849      0.756
## 10 CNAG_00011        0.878      0.937      0.691        0.743      0.862
## # ... with 6,787 more rows, and 5 more variables: u1.scorekw <dbl>,
## #   d1vsan <dbl>, u1vsan <dbl>, d1vsaw <dbl>, u1vsaw <dbl>

Plot against narrow consensus (-5 to ATG)

Plot against wide consensus (-9 to +3 from ATG)

Compare aATG and dATG context by gene

Most dATG scores are less than aATG scores

Most u1ATG scores are less than aATG scores

For highly translated genes, most dATG scores are much less than aATG scores

Genes with unusual dATG vs aATG narrow score

These are the 330 genes with the largest narrow-score difference (dATG minus aATG), in decreasing order:

## # A tibble: 330 x 3
##    Gene       aATG.scorekn d1.scorekn
##    <chr>             <dbl>      <dbl>
##  1 CNAG_07473        0.606      0.969
##  2 CNAG_04147        0.644      0.984
##  3 CNAG_04764        0.641      0.948
##  4 CNAG_07801        0.675      0.978
##  5 CNAG_02259        0.676      0.978
##  6 CNAG_03953        0.692      0.991
##  7 CNAG_07776        0.700      0.993
##  8 CNAG_06278        0.699      0.991
##  9 CNAG_00165        0.675      0.957
## 10 CNAG_04179        0.709      0.991
## # ... with 320 more rows

dATG vs aATG ribosome occupancy depends on the context

Restricted to the top 50% of genes (3315) by mean RNA TPM.

dATG vs aATG ribosome occupancy and score, geometric mean across replicates

Compare score difference to localization predictions

Load predictions from MitoFates

From the input file H99_mitofates_26June2018.txt.

Genes with high dATG vs aATG score are enriched in mitochondrial presequences

However, mito-localized genes do not have a distinctive aATG context

Only a subset of them does: the dual-localized ones.

uATGs inhibit translation of the main aORF

uATGs are associated with lower absolute translation

uATGs are associated with lower translation efficiency

uATG vs aATG ribosome occupancy depends on the context

Restricted to the top 50% of genes (3315) by mean RNA TPM.


Results on genes conserved between H99 and JEC21

Load list of orthologs

From the 2016 paper.

## # A tibble: 6,341 x 2
##    H99        JEC21   
##    <chr>      <chr>   
##  1 CNAG_01397 CND05080
##  2 CNAG_07825 CNH03545
##  3 CNAG_05539 CNH01890
##  4 CNAG_03635 CNB01365
##  5 CNAG_06621 CNF03970
##  6 CNAG_00830 CNA08090
##  7 CNAG_07556 CNK01100
##  8 CNAG_06796 CNB00060
##  9 CNAG_06009 CNM00180
## 10 CNAG_03522 CNG00710
## # ... with 6,331 more rows
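The minimal Python sketch below pairs the two strains' per-gene tables through this ortholog list. It produces columns suffixed .H99 and .JEC21, as in the cross-strain tables of this section. Plain dicts stand in for the R tibbles. Pairs where either gene lacks data are dropped, which is an inner join, assumed here.

```python
def join_orthologs(pairs, h99, jec21):
    """Combine per-gene records from H99 and JEC21 via (H99, JEC21) ortholog pairs.

    `h99` and `jec21` map gene -> dict of measurements; each measurement key
    gets a strain suffix. Pairs where either gene lacks data are skipped.
    """
    joined = []
    for h_gene, j_gene in pairs:
        if h_gene not in h99 or j_gene not in jec21:
            continue
        row = {"H99": h_gene, "JEC21": j_gene}
        row.update({f"{k}.H99": v for k, v in h99[h_gene].items()})
        row.update({f"{k}.JEC21": v for k, v in jec21[j_gene].items()})
        joined.append(row)
    return joined

pairs = [("CNAG_06125", "CNM01300"), ("CNAG_01397", "CND05080")]
h99 = {"CNAG_06125": {"RNA": 10279.0, "RPF": 20179.0, "TE": 1.96}}
jec21 = {"CNM01300": {"RNA": 4001.0, "RPF": 18299.0, "TE": 4.57}}
rows = join_orthologs(pairs, h99, jec21)
print(rows[0]["TE.JEC21"], len(rows))  # 4.57 1
```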

Conservation of gene expression

RNA Abundance

Ribosome Occupancy

Translation efficiency, no threshold

Translation efficiency, filtered to the top 50% of genes by expression
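Agreement between strains in these comparisons can be summarised by a correlation of log-transformed values. The minimal Python sketch below computes the Pearson correlation of log2 TE for paired orthologs. The log transform and the choice of Pearson rather than, say, Spearman are assumptions. They are not necessarily what the plots use.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# TE.H99 and TE.JEC21 for the first five rows of the high-translation table.
te_h99 = [1.96, 0.977, 1.91, 1.15, 1.02]
te_jec21 = [4.57, 1.07, 1.24, 2.22, 0.398]
r = pearson([math.log2(v) for v in te_h99], [math.log2(v) for v in te_jec21])
print(round(r, 2))
```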

Genes with high translation

## # A tibble: 20 x 8
##    H99        JEC21    RNA.H99 RPF.H99 TE.H99 RNA.JEC21 RPF.JEC21 TE.JEC21
##    <chr>      <chr>      <dbl>   <dbl>  <dbl>     <dbl>     <dbl>    <dbl>
##  1 CNAG_06125 CNM01300  10279.  20179.  1.96      4001.    18299.    4.57 
##  2 CNAG_06101 CNM01080   8672.   8471.  0.977     8388.     8973.    1.07 
##  3 CNAG_00779 CNA07570   3861.   7368.  1.91      5785.     7163.    1.24 
##  4 CNAG_03127 CNG04360   6257.   7184.  1.15      3188.     7062.    2.22 
##  5 CNAG_05762 CNF02150   7396.   7528.  1.02     15472.     6159.    0.398
##  6 CNAG_03739 CNB02360   6383.   6394.  1.00      3708.     6972.    1.88 
##  7 CNAG_06222 CNM02240   6401.   6784.  1.06      4442.     6022.    1.36 
##  8 CNAG_00655 CNA06350  12493.   6052.  0.484    15095.     6604.    0.437
##  9 CNAG_04011 CNB04930  13313.   6786.  0.510    19956.     5660.    0.284
## 10 CNAG_06633 CNF03840   8926.   6146.  0.689    11379.     6188.    0.544
## 11 CNAG_01332 CND04480   5928.   6096.  1.03      4661.     5987.    1.28 
## 12 CNAG_03015 CNC00700   4860.   5704.  1.17      2357.     6232.    2.64 
## 13 CNAG_04448 CNI01090   6660.   5962.  0.895     5374.     5900.    1.10 
## 14 CNAG_00771 CNA07490   6830.   5879.  0.861     6812.     5968.    0.876
## 15 CNAG_00640 CNA06200   7706.   5784.  0.751     4908.     6048.    1.23 
## 16 CNAG_04883 CNJ03110   4275.   5882.  1.38      5777.     5917.    1.02 
## 17 CNAG_04726 CNJ01560   7970.   6377.  0.800     6365.     5406.    0.849
## 18 CNAG_00672 CNA06500   9202.   6072.  0.660    14347.     5654.    0.394
## 19 CNAG_05525 CNH01770   6978.   6456.  0.925     4071.     5212.    1.28 
## 20 CNAG_03780 CNB02750   6631.   5710.  0.861     5014.     5892.    1.18

Genes with high TE

## # A tibble: 20 x 8
##    H99        JEC21    RNA.H99 RPF.H99 TE.H99 RNA.JEC21 RPF.JEC21 TE.JEC21
##    <chr>      <chr>      <dbl>   <dbl>  <dbl>     <dbl>     <dbl>    <dbl>
##  1 CNAG_01130 CND02530    47.7   301.    6.31      32.7      298.     9.11
##  2 CNAG_01890 CNK02310   248.   1447.    5.84     280.      1905.     6.79
##  3 CNAG_06150 CNM01520   607.   3357.    5.53     540.      2932.     5.43
##  4 CNAG_02994 CNC06020    68.6   262.    3.81      31.3      219.     6.98
##  5 CNAG_01750 CNC02520   257.   1309.    5.10     312.      1659.     5.31
##  6 CNAG_04327 CNI02220    44.7   202.    4.52      34.8      197.     5.66
##  7 CNAG_01727 CNC02320   737.   3578.    4.86     710.      3755.     5.29
##  8 CNAG_01744 CNC02470   104.    316.    3.04      33.7      238.     7.05
##  9 CNAG_01117 CND02420   437.   2100.    4.81     441.      2279.     5.17
## 10 CNAG_05907 CNF00650    94.5   298.    3.15      66.5      451.     6.78
## 11 CNAG_04640 CNJ00800   217.    823.    3.80     159.       971.     6.09
## 12 CNAG_04313 CNI02360   205.    225.    1.10      31.0      254.     8.20
## 13 CNAG_07373 CNA06000    65.8   302.    4.59      78.2      355.     4.53
## 14 CNAG_05602 CNH02450   382.    676.    1.77      27.6      192.     6.97
## 15 CNAG_06840 CND06220  1197.   2883.    2.41     463.      2900.     6.27
## 16 CNAG_00136 CNA01230    46.4   197.    4.24      45.6      200.     4.39
## 17 CNAG_05884 CNF00890    79.8   295.    3.69      73.2      361.     4.93
## 18 CNAG_06208 CNM02070   251.    977.    3.89     232.       988.     4.26
## 19 CNAG_00992 CND01200   254.    891.    3.51     263.      1170.     4.45
## 20 CNAG_04659 CNJ00950    25.8    55.7   2.16      26.4      152.     5.78

Genes with low TE

To-do: Check which of these have uATGs.

## # A tibble: 20 x 8
##    H99        JEC21   RNA.H99 RPF.H99  TE.H99 RNA.JEC21 RPF.JEC21 TE.JEC21
##    <chr>      <chr>     <dbl>   <dbl>   <dbl>     <dbl>     <dbl>    <dbl>
##  1 CNAG_07888 CNH025…  2649.    2.58  9.75e-4     582.      0.470 0.000806
##  2 CNAG_07695 CNF003…   164.    5.72  3.48e-2     183.      3.21  0.0176  
##  3 CNAG_03140 CNG042…   188.    2.03  1.08e-2     124.      5.51  0.0442  
##  4 CNAG_05574 CNH022…    30.1   2.58  8.59e-2      42.6     1.67  0.0391  
##  5 CNAG_04855 CNJ027…    30.4   2.66  8.75e-2      88.2     6.70  0.0760  
##  6 CNAG_06614 CNF040…    41.8   4.48  1.07e-1      58.1     4.58  0.0789  
##  7 CNAG_01603 CNC011…    52.5   3.41  6.49e-2      25.0     3.04  0.122   
##  8 CNAG_02323 CNE022…    39.1   3.34  8.52e-2      50.2     5.12  0.102   
##  9 CNAG_03578 CNG002…    43.5   6.10  1.40e-1      58.6     4.37  0.0745  
## 10 CNAG_07813 CNL049…   148.   20.0   1.35e-1     204.     17.4   0.0850  
## 11 CNAG_06246 CNM024…   196.   24.6   1.25e-1     172.     17.0   0.0990  
## 12 CNAG_05319 CNH031…    35.4   0.588 1.66e-2      35.6     7.71  0.217   
## 13 CNAG_00784 CNA076…    52.8   6.67  1.26e-1      50.1     5.86  0.117   
## 14 CNAG_08027 CNH020…    25.6   2.64  1.03e-1      90.3    13.6   0.151   
## 15 CNAG_02433 CNE012…    39.1   6.71  1.72e-1     115.     11.1   0.0965  
## 16 CNAG_00529 CNA051…    40.2   7.94  1.98e-1      97.3     7.85  0.0806  
## 17 CNAG_05237 CNL039…    33.0   9.10  2.76e-1      72.3     0.278 0.00384 
## 18 CNAG_05288 CNH034…    56.3   8.89  1.58e-1      70.4     8.85  0.126   
## 19 CNAG_01624 CNC013…    34.4   5.03  1.46e-1      30.5     4.37  0.143   
## 20 CNAG_02867 CNC048…    54.6   7.30  1.34e-1      47.8     7.51  0.157

Genes with dATG score high relative to aATG score

We select genes for which RNA abundance (top 50%), the dATG vs aATG narrow-score difference (top 5%), and the dATG frame are all conserved between H99 and JEC21.
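The minimal Python sketch below applies this three-part filter. It assumes each strain's table has already been reduced to three flags per gene. The field names (`rna_top50`, `diff_top5`, `d1_frame`) and the gene IDs are hypothetical. "Frame conserved" is read here as identical d1.frame values in both strains, which is an assumption. Frame 0 is taken to mean in frame with the aATG, since the d1.frame values in the context tables match the distance from the aATG modulo 3.

```python
def conserved_high_diff(pairs, h99, jec21):
    """Split ortholog pairs that pass all filters in both strains by dATG frame.

    Each strain table maps gene -> dict with hypothetical keys:
      rna_top50 (bool), diff_top5 (bool), d1_frame (0, 1 or 2).
    Returns (in_frame, out_of_frame) lists of (H99, JEC21) pairs.
    """
    in_frame, out_of_frame = [], []
    for h_gene, j_gene in pairs:
        h, j = h99.get(h_gene), jec21.get(j_gene)
        if h is None or j is None:
            continue
        if not (h["rna_top50"] and j["rna_top50"] and h["diff_top5"] and j["diff_top5"]):
            continue
        if h["d1_frame"] != j["d1_frame"]:
            continue  # frame not conserved between strains
        (in_frame if h["d1_frame"] == 0 else out_of_frame).append((h_gene, j_gene))
    return in_frame, out_of_frame

def flags(frame):
    return {"rna_top50": True, "diff_top5": True, "d1_frame": frame}

# Hypothetical genes h1..h3 (H99) and j1..j3 (JEC21).
pairs = [("h1", "j1"), ("h2", "j2"), ("h3", "j3")]
h99 = {"h1": flags(0), "h2": flags(2), "h3": flags(1)}
jec21 = {"j1": flags(0), "j2": flags(2), "j3": flags(0)}
print(conserved_high_diff(pairs, h99, jec21))  # ([('h1', 'j1')], [('h2', 'j2')])
```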

dATG in frame with the aATG

Saved to file dvsaATG_highdiffn_inframe_cc.txt.

## # A tibble: 44 x 6
##    H99        JEC21    a.skn.H99 d.skn.H99 a.skn.JEC21 d.skn.JEC21
##    <chr>      <chr>        <dbl>     <dbl>       <dbl>       <dbl>
##  1 CNAG_07473 CNB01880     0.606     0.969       0.625       0.894
##  2 CNAG_07776 CNI00670     0.700     0.993       0.710       0.990
##  3 CNAG_00165 CNA01530     0.675     0.957       0.675       0.956
##  4 CNAG_02545 CNE00210     0.666     0.940       0.682       0.943
##  5 CNAG_07801 CNL06190     0.675     0.978       0.685       0.904
##  6 CNAG_01544 CNC06400     0.727     0.978       0.723       0.978
##  7 CNAG_03953 CNB04410     0.692     0.991       0.741       0.947
##  8 CNAG_04179 CNI03160     0.709     0.991       0.725       0.947
##  9 CNAG_05722 CNF02520     0.668     0.912       0.675       0.918
## 10 CNAG_02431 CNE01260     0.727     0.999       0.731       0.943
## 11 CNAG_02880 CNC04930     0.666     0.912       0.682       0.911
## 12 CNAG_03996 CNB04810     0.666     0.899       0.625       0.848
## 13 CNAG_00517 CNA04990     0.732     0.934       0.702       0.942
## 14 CNAG_02259 CNE02870     0.676     0.978       0.695       0.835
## 15 CNAG_03396 CNG01890     0.632     0.861       0.642       0.854
## 16 CNAG_00086 CNA00760     0.736     0.957       0.742       0.956
## 17 CNAG_07873 CNH00360     0.733     0.948       0.749       0.946
## 18 CNAG_04219 CNI03610     0.779     0.978       0.771       0.978
## 19 CNAG_04604 CNJ00430     0.789     0.990       0.797       0.990
## 20 CNAG_00026 CNA00190     0.726     0.957       0.720       0.877
## # ... with 24 more rows

dATG out of frame

Saved to file dvsaATG_highdiffn_outframe_cc.txt.

## # A tibble: 14 x 6
##    H99        JEC21    a.skn.H99 d.skn.H99 a.skn.JEC21 d.skn.JEC21
##    <chr>      <chr>        <dbl>     <dbl>       <dbl>       <dbl>
##  1 CNAG_06278 CNN00160     0.699     0.991       0.724       0.989
##  2 CNAG_04054 CNB05380     0.717     0.978       0.723       0.978
##  3 CNAG_02894 CNC05065     0.783     0.978       0.798       0.978
##  4 CNAG_06006 CNM00150     0.715     0.934       0.781       0.933
##  5 CNAG_02809 CNC04270     0.801     0.978       0.792       0.978
##  6 CNAG_03008 CNC06190     0.827     0.999       0.817       1    
##  7 CNAG_03370 CNG02120     0.803     0.969       0.795       0.968
##  8 CNAG_01667 CNC01780     0.842     0.990       0.837       0.990
##  9 CNAG_00784 CNA07610     0.692     0.842       0.696       0.837
## 10 CNAG_02578 CNK00690     0.826     0.945       0.791       0.944
## 11 CNAG_01241 CND03590     0.757     0.883       0.761       0.897
## 12 CNAG_07780 CNI00090     0.876     0.999       0.867       1    
## 13 CNAG_01270 CND03900     0.668     0.791       0.675       0.791
## 14 CNAG_03839 CNB03280     0.824     0.941       0.822       0.942

Notes on the genes in this list:

  • CNN00160 two-component-like sensor kinase TCO7
  • CNB05380 SUI1/eIF1, translation initiation factor.
  • CNC06190 has a domain conserved with eIF2.
  • CNG02120 frequenin calcium-binding protein, FRQ1 homolog
  • CNC05065/CNM00150/CNC04270/CNC06190/CNA07610/CND03900, all hypothetical or uncharacterized.
  • CNC01780 prenyltransferase, COQ2 homolog
  • CNK00690 regulation of meiosis-related, PCH2 homolog
  • CND03590 protein phosphatase I nuclear regulatory subunit, SDS22 homolog
  • CNI00090 farnesyltranstransferase, BTS1 homolog
  • CND03900 has a W2 eIF4-gamma/eIF5/eIF2-epsilon - like domain
  • CNB03280 mRNA transcription modulator, CCR4-NOT ubiquitin ligase subunit MOT2 homolog.

GO analysis of conserved genelists

This analysis was run on 25 June, using values generated by CryptoATGcontext at that time; it is not reproducible from this document.

I performed GO analysis on JEC21 gene names with PANTHER.db (PANTHER version 13.1, released 2018-02-03), using the overrepresentation test on GO-slim terms.

Link: http://www.pantherdb.org/tools/compareToRefList.jsp

dATG score high compared to aATG, dATG out of frame: 14 genes

File dvsaATG_highdiffn_outframe_cc.txt.

No significant GO terms.

dATG score high compared to aATG, dATG in frame: 44 genes

File dvsaATG_highdiffn_inframe_cc.txt.

Enriched in Biological processes:

  • tRNA aminoacylation for protein translation < translation < protein metabolic process < primary metabolic process < metabolic process
  • tRNA metabolic process < RNA metabolic process < nucleobase-containing compound metabolic process
  • Unclassified

Molecular Function:

  • aminoacyl-tRNA ligase activity < ligase activity < catalytic activity

Cellular Component:

  • cytosol < cytoplasm < intracellular < cell part

Highly translated, 291 genes

File hiTrans_cc.txt.

Enriched BPs include:

  • oxidative phosphorylation
  • glycolysis
  • translation
  • protein folding
  • cation transport
  • mitochondrial transport

Enriched MFs include:

  • transmembrane transporter activity
  • structural constituent of ribosome
  • translation elongation factor activity

Enriched CCs include:

  • proton-transporting ATP synthase complex
  • ribosome
  • mitochondrial inner membrane

High translation efficiency, 174 genes

File hiTE_cc.txt.

Enriched BPs include:

  • pentose-phosphate shunt < monosaccharide metabolic process
  • tRNA aminoacylation for protein translation < translation
  • acyl-CoA metabolic process
  • tricarboxylic acid cycle
  • cellular amino acid biosynthetic process
  • nuclear transport

Enriched MFs include:

  • translation elongation factor activity
  • translation initiation factor activity
  • aminoacyl-tRNA ligase activity

Enriched CCs include:

  • cytosol < cytoplasm

Low translation efficiency, 284 genes

File loTE_cc.txt.

Enriched BPs include:

  • anion transport

Enriched MFs: no significant results.

Enriched CCs: no significant results.
